Similar resources
Chapter 2 Duplicate Record Detection Using ANFIS
The problem of duplicate detection is to determine whether the same real-world object is represented by two or more distinct entries in a database. Duplicate detection is also known as record linkage or record matching. It is a heavily researched topic and is of vital importance in fields such as master data management, data warehousing and ETL (Extraction, Transformation and Loading), cu...
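As a rough illustration of record matching, the sketch below compares two records field by field using Python's standard `difflib.SequenceMatcher` and flags them as duplicates when the average field similarity reaches a threshold. The normalization step, the helper names (`normalize`, `field_similarity`, `is_duplicate`), the equal weighting of fields and the 0.85 threshold are all hypothetical choices made for illustration, not the method of the chapter.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    # Lowercase and collapse runs of whitespace so formatting differences do not count as mismatches.
    return " ".join(value.lower().split())


def field_similarity(a: str, b: str) -> float:
    # Similarity in [0, 1] based on matching subsequences (difflib's ratio()).
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_duplicate(rec_a: dict, rec_b: dict, fields: list[str], threshold: float = 0.85) -> bool:
    # Hypothetical rule: average the per-field similarities and compare with a fixed threshold.
    scores = [field_similarity(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold


a = {"name": "John A. Smith", "city": "New York"}
b = {"name": "john a smith", "city": "New  York"}
c = {"name": "Maria Garcia", "city": "Boston"}
print(is_duplicate(a, b, ["name", "city"]))
print(is_duplicate(a, c, ["name", "city"]))
```

Here `a` and `b` differ only in case, punctuation and spacing, so the first call prints `True`; `a` and `c` describe different people, so the second prints `False`. Real systems usually learn field weights and thresholds from labeled pairs rather than fixing them by hand.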
TA-DRD: A Three-step Automatic Duplicate Record Detection
Duplicate record detection is a key step in Deep Web data integration, but existing approaches are not suited to its large scale. In this paper, a three-step automatic approach is proposed for duplicate record detection in the Deep Web. It first uses a cluster ensemble to select the initial training instances. It then uses tri-training classification to construct a classification model. Final...
Chapter 3 Duplicate Record Detection Using GA and PSO
The present chapter extends the research discussed in Chapter 2 by incorporating optimization algorithms. Moises G. de Carvalho et al. (2011) proposed a genetic programming approach to record deduplication. This approach automatically proposes a duplicate record detection function by combining several pieces of evidence taken from the data. This function makes it possible to identify whether t...
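In a genetic-programming approach, each individual in the population is a candidate function that combines pieces of evidence, and the search keeps the fitter functions. The sketch below omits the evolutionary loop (selection, crossover, mutation) and shows only how a candidate might be represented as an expression tree over evidence scores and scored with an F1-based fitness on labeled record pairs. The evidence names (`name_sim`, `city_sim`, `zip_sim`), the operator set, the threshold and the use of F1 as fitness are hypothetical assumptions, not de Carvalho et al.'s exact setup.

```python
import operator

# Hypothetical operator set that a GP run might draw from.
OPS = {"+": operator.add, "*": operator.mul, "max": max}


def evaluate(tree, evidence):
    """Evaluate an expression tree on one record pair's evidence scores.

    A tree is an evidence name (str), a numeric constant, or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return evidence[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, evidence), evaluate(right, evidence))


def fitness(tree, labeled_pairs, threshold=1.0):
    """F1 score of a candidate: a pair is predicted duplicate if its value >= threshold."""
    tp = fp = fn = 0
    for evidence, is_dup in labeled_pairs:
        predicted = evaluate(tree, evidence) >= threshold
        if predicted and is_dup:
            tp += 1
        elif predicted:
            fp += 1
        elif is_dup:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical training pairs: per-attribute similarity scores in [0, 1] plus the true label.
pairs = [
    ({"name_sim": 0.95, "city_sim": 1.0, "zip_sim": 1.0}, True),
    ({"name_sim": 0.90, "city_sim": 0.0, "zip_sim": 1.0}, True),
    ({"name_sim": 0.30, "city_sim": 1.0, "zip_sim": 0.0}, False),
    ({"name_sim": 0.20, "city_sim": 0.0, "zip_sim": 0.0}, False),
]

candidate_a = ("+", ("*", "name_sim", 0.5), ("max", "city_sim", "zip_sim"))
candidate_b = ("+", "name_sim", ("*", "zip_sim", 0.5))
print(fitness(candidate_a, pairs), fitness(candidate_b, pairs))
```

Candidate A scores an F1 of about 0.8 because it accepts the third pair (same city, different name) as a duplicate, while candidate B separates all four pairs and scores 1.0, so a GP search would favor B when breeding the next generation.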
A Survey on Duplicate Detection in Hierarchical Data
Although a lot of work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures, such as XML data. In this paper, we demonstrate a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to calculate the probability of two XML n...
A Survey of Duplicate and Near-Duplicate Techniques
The World Wide Web consists of more than 50 billion pages online. The advent of the World Wide Web caused a dramatic increase in Internet usage. The World Wide Web is a broadcast medium through which a wide range of information can be obtained at low cost. A great deal of the Web consists of replicated or near-replicated content. Documents may be served in different formats: HTML, PDF, and text for diff...
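One widely used way to detect near-duplicate documents, not necessarily among the techniques this survey covers, is to break each document into overlapping word sequences (shingles) and compare the resulting sets with Jaccard similarity. The sketch below assumes word 3-shingles and exact set comparison; the function names and sample texts are hypothetical, and large-scale systems typically approximate the same measure with techniques such as MinHash rather than comparing full shingle sets.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Tokenize into lowercase words, then collect every run of k consecutive words.
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "The World Wide Web is a broadcast medium where a wide range of information can be obtained"
doc2 = "The World Wide Web is a broadcast medium where a wide range of information can be found"
doc3 = "Duplicate record detection is a key step in Deep Web data integration"
print(jaccard(shingles(doc1), shingles(doc2)))
print(jaccard(shingles(doc1), shingles(doc3)))
```

The first two sample sentences differ only in their final word, so 14 of their 16 distinct shingles are shared and the similarity is 0.875; the third shares no shingle with the first, giving 0.0. A near-duplicate detector would then flag pairs whose similarity exceeds a chosen cutoff.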
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2007
ISSN: 1041-4347
DOI: 10.1109/tkde.2007.250581